PROBLEM TO BE SOLVED: To provide a device and a method for time/pitch conversion that can easily vary the pitch and the playback time of reproduced voice without enlarging the configuration, complicating the processing, or degrading the reproduced sound quality.

SOLUTION: The spectrum of voice data compressed as frequency data is shifted, the shifted data are interpolated or thinned out, and the result is converted back into time-series voice data.
CLAIMS



[Claim(s)] 

[Claim 1] A time/pitch converter provided in a voice reproduction system that receives voice data compressed as frequency data and inverse-transforms that data from the frequency domain to the time domain to obtain time-series voice data, the time/pitch converter comprising: shift means for shifting, when the compressed voice data is inverse-transformed from the frequency domain to the time domain to obtain the time-series voice data, the spectrum of the voice data in the frequency domain according to the pitch conversion amount of the voice data, thereby determining the reproduction frequency of the time-series voice data; and interpolation/thinning means for interpolating or thinning out the voice data of the shifted spectrum obtained by said shift means, so that the number of voice data points in the frequency-domain spectrum over the same bandwidth is the same before and after the shift; wherein the pitch of the voice data is changed when the frequency-domain voice data obtained by said interpolation/thinning means is inverse-transformed into time-series voice data.

[Claim 2] A time/pitch converter provided in a voice reproduction system that receives voice data compressed as frequency data, inverse-transforms that data from the frequency domain to the time domain to obtain digitized time-series voice data, converts the digitized time-series voice data into analog voice data with a DAC, and reproduces it, the time/pitch converter comprising: shift means for shifting, when the compressed voice data is inverse-transformed from the frequency domain to the time domain to obtain the time-series voice data, the spectrum of the voice data in the frequency domain according to the playback time of the reproduced voice, thereby determining the reproduction frequency of the time-series voice data; interpolation/thinning means for interpolating or thinning out the voice data of the shifted spectrum obtained by said shift means, so that the number of voice data points in the frequency-domain spectrum over the same bandwidth is the same before and after the shift; and clock generation means for generating a clock signal whose frequency is variable according to the playback time of the reproduced voice and supplying the generated clock signal at least to said DAC; wherein the playback time of the voice data is extended or shortened when said DAC converts the digitized time-series voice data into analog voice data based on the clock signal supplied from said clock generation means.

[Claim 3] The time/pitch converter according to claim 1 or 2, wherein the voice data compressed as frequency data is stored in a storage medium from which data can be read out at an arbitrary read-out rate.
[Claim 4] A time/pitch conversion method comprising: receiving voice data compressed as frequency data; when the compressed voice data is inverse-transformed from the frequency domain to the time domain to obtain time-series voice data, shifting the spectrum of the voice data in the frequency domain according to the pitch change amount of the voice data, thereby determining the reproduction frequency of the time-series voice data; interpolating or thinning out the voice data of the shifted spectrum so that the number of voice data points in the frequency-domain spectrum over the same bandwidth is the same before and after the shift; and changing the pitch of the voice data when the frequency-domain voice data obtained by the interpolation/thinning is inverse-transformed into time-series voice data.



[Claim 5] A time/pitch conversion method comprising: receiving voice data compressed as frequency data; when the compressed voice data is inverse-transformed from the frequency domain to the time domain to obtain time-series voice data, shifting the spectrum of the voice data in the frequency domain according to the playback time of the reproduced voice, thereby determining the reproduction frequency of the time-series voice data; interpolating or thinning out the voice data of the shifted spectrum so that the number of voice data points in the frequency-domain spectrum over the same bandwidth is the same before and after the shift; generating a clock signal whose frequency is variable according to the playback time of the reproduced voice; supplying the generated clock signal at least to a DAC; and extending or shortening the playback time of the voice data when said DAC converts the digitized time-series voice data, obtained by the inverse transformation from the frequency domain to the time domain, into analog voice data based on the supplied clock signal.
DETAILED DESCRIPTION



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to a time/pitch converter and a time/pitch conversion method that convert the playback time or the pitch of reproduced voice in a system that reproduces a signal whose input is frequency data rather than time-series data.
[0002] 

[Description of the Prior Art] Pitch conversion techniques are needed for various applications, such as effectors for pitch conversion in recording, equipment that changes the performance time for commercial production, speech-rate converters for meeting minutes, interviews, and news, and pitch controllers for karaoke.

[0003] Conventionally, techniques for changing the pitch of voice data have been roughly divided into two kinds: processing in the time domain and processing in the frequency domain. In time-domain processing, break points occurred in the waveform on the time axis and appeared as jarring noise during voice playback. Frequency-domain processing, by contrast, produces no such break points, so no noise is generated. However, media such as tape and CD record voice as time-series data, so performing pitch conversion in the frequency domain required a time-to-frequency conversion such as the FFT (fast Fourier transform). The FFT requires a large amount of computation, which had the drawback of requiring an arithmetic circuit with high throughput.
[0004] Next, pitch conversion is explained in detail.

[0005] As mentioned above, techniques for changing pitch are roughly divided into two kinds: (a) those based on data processing in the time domain, and (b) those based on data processing in the frequency domain. The former has been used mainly in simple systems such as karaoke key control, and the latter in systems with strict tone-quality requirements, such as musical instruments.
[0006] An example of pitch conversion by technique (a) is shown in drawing 13. In time-domain processing, the pitch is raised or lowered by controlling the reproduction speed of the time-series data; as shown in drawing 13, however, care is required because the playback time is shortened or extended at the same time. That is, lowering the pitch simultaneously extends the playback time, and raising the pitch simultaneously shortens it. Here the aim is to change only the pitch without changing the playback time, so the playback time must remain the same as that of the original data. Consequently, when the pitch of the original data is lowered, an overlapping portion necessarily arises somewhere, and when the pitch is raised, a missing portion of data necessarily arises somewhere. Because these make the time-series data discontinuous, reproducing the data as they are generates noise and degrades tone quality. Cross-fade processing is one technique for avoiding this problem. When the pitch is lowered, as shown in drawing 14, this processing fades out the end of one continuous waveform while simultaneously fading in the beginning of the next, joining the two with a cross-fade. This reduces the noise at the joint. When the pitch is raised, on the other hand, the same data are reproduced twice to compensate for the missing portion, and the noise at the joint is likewise reduced by cross-fading. However, with cross-fade processing, a good result may not be obtained when the fade-out sound and the fade-in sound are in opposite phase. Another concern is that a periodic wave can appear in the reproduced sound.
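A minimal sketch of the cross-fade described above, in Python. The linear gain ramp and the function name `crossfade` are assumptions for illustration; the text does not specify the fade curve. The second call illustrates the phase problem: cross-fading a segment into an inverted copy cancels to silence in the middle of the joint.

```python
def crossfade(tail, head):
    """Fade out `tail` (end of one waveform) while fading in `head` (start of the next).

    Both segments must have the same length; a linear gain ramp is assumed.
    """
    n = len(tail)
    if n != len(head):
        raise ValueError("tail and head must have the same length")
    if n == 1:
        return [(tail[0] + head[0]) / 2]
    out = []
    for i in range(n):
        g = i / (n - 1)                      # fade-in gain rises from 0 to 1
        out.append(tail[i] * (1 - g) + head[i] * g)
    return out


# In-phase segments join smoothly.
print(crossfade([1.0] * 5, [0.0] * 5))   # [1.0, 0.75, 0.5, 0.25, 0.0]
# Opposite-phase segments cancel in the middle of the joint.
print(crossfade([1.0] * 5, [-1.0] * 5))  # [1.0, 0.5, 0.0, -0.5, -1.0]
```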

[0007] Technique (b), which changes the pitch by processing in the frequency domain, can change the pitch easily by shifting the data along the frequency axis, as shown in drawing 15, and produces no break points on the time axis. For this reason, the tone quality of the reproduced sound is better than with technique (a). However, voice data output from tape, CD, and similar media are time-series data, and converting them from the time domain to the frequency domain requires computation such as the FFT. This computation can be performed by equipment or systems consisting mainly of an arithmetic circuit and memory, such as a DSP (digital signal processor), but it involves a large amount of calculation and therefore requires an arithmetic circuit with high throughput.

[0008] Next, time conversion techniques, which change the playback time of voice data, are explained.
[0009] Shortening or extending only the playback time, without changing the pitch of the reproduced sound, is called time stretch/compression and is used mainly in devices called speech-rate converters and samplers. It can be realized by applying the pitch conversion techniques described above.

[0010] As mentioned above, slowing the reproduction speed to lengthen the playback time lowers the pitch of the reproduced sound, so a pitch conversion technique is then used to restore the original pitch. As a result, as shown in drawing 16, only the playback time is extended while the pitch stays unchanged. To shorten the playback time, the reverse operation is performed.
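The coupling between reproduction speed, playback time, and pitch described above can be seen with a simple variable-speed reader. This Python sketch is only an illustration: the name `vary_speed` and the use of linear interpolation between samples are assumptions, and the subsequent pitch-restoring step is not shown. At half speed every original sample reappears at every second output position, so the same waveform occupies twice as many samples; at a fixed output sample rate, playback time doubles and the pitch falls by half.

```python
import math


def vary_speed(x, speed):
    """Read `x` at `speed` input samples per output sample (linear interpolation).

    speed < 1 lengthens playback and lowers pitch; speed > 1 does the opposite.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    count = int((len(x) - 1) / speed) + 1     # output samples that stay within x
    out = []
    for k in range(count):
        pos = k * speed
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append(x[i] * (1 - frac) + nxt * frac)
    return out


FS = 32000                                    # sampling frequency, as in [0025]
x = [math.cos(2 * math.pi * 1000 * t / FS) for t in range(64)]   # 1 kHz tone
y = vary_speed(x, 0.5)
print(len(x), len(y))                         # 64 127
```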

[0011] Until now, when time stretch/compression was performed while reproducing commonly used media that record time-series data, such as CDs and music tapes, the techniques adopted were either to add equipment that makes the read-out rate from the medium adjustable and thereby controls the reproduction speed, or to give the system a large buffer memory and adjust the playback time with it. However, these required complicated additional equipment and large-scale processing, and neither could be realized easily.
[0012] 

[Problem(s) to be Solved by the Invention] As explained above, among the conventional techniques for changing the pitch of voice data, time-domain processing applies cross-fade processing to avoid discontinuities in the voice data, but even so it is difficult to reliably remove noise from the reproduced sound, and tone quality deteriorates. Frequency-domain processing, on the other hand, requires converting the voice data from the time domain to the frequency domain, and this conversion required a large-scale configuration and a great deal of time.

[0013] This invention was made in view of the above, and its object is to provide a time/pitch converter and a time/pitch conversion method that easily change the pitch or the playback time of reproduced voice without enlarging the configuration, complicating the processing, or degrading playback tone quality.
[0014] 

[Means for Solving the Problem] To achieve the above object, the first means for solving the problem is a time/pitch converter provided in a voice reproduction system that receives voice data compressed as frequency data and inverse-transforms that data from the frequency domain to the time domain to obtain time-series voice data. The converter has shift means for shifting, when the compressed voice data is inverse-transformed from the frequency domain to the time domain to obtain the time-series voice data, the spectrum of the voice data in the frequency domain according to the pitch conversion amount of the voice data, thereby determining the reproduction frequency of the time-series voice data; and interpolation/thinning means for interpolating or thinning out the voice data of the shifted spectrum obtained by said shift means, so that the number of voice data points in the frequency-domain spectrum over the same bandwidth is the same before and after the shift. It is characterized in that the pitch of the voice data is changed when the frequency-domain voice data obtained by said interpolation/thinning means is inverse-transformed into time-series voice data.

[0015] The second means is a time/pitch converter provided in a voice reproduction system that receives voice data compressed as frequency data, inverse-transforms that data from the frequency domain to the time domain to obtain digitized time-series voice data, converts the digitized time-series voice data into analog voice data with a DAC, and reproduces it. The converter has shift means for shifting, when the compressed voice data is inverse-transformed from the frequency domain to the time domain to obtain the time-series voice data, the spectrum of the voice data in the frequency domain according to the playback time of the reproduced voice, thereby determining the reproduction frequency of the time-series voice data; interpolation/thinning means for interpolating or thinning out the voice data of the shifted spectrum obtained by said shift means, so that the number of voice data points in the frequency-domain spectrum over the same bandwidth is the same before and after the shift; and clock generation means for generating a clock signal whose frequency is variable according to the playback time of the reproduced voice and supplying the generated clock signal at least to said DAC. It is characterized in that the playback time of the voice data is extended or shortened when said DAC converts the digitized time-series voice data into analog voice data based on the clock signal supplied from said clock generation means.

[0016] The third means is characterized in that, in said first or second means, the voice data compressed as frequency data is stored in a storage medium from which data can be read out at an arbitrary read-out rate.
[0017] The fourth means is a time/pitch conversion method in which voice data compressed as frequency data is received and, when the compressed voice data is inverse-transformed from the frequency domain to the time domain to obtain time-series voice data, the spectrum of the voice data in the frequency domain is shifted according to the pitch change amount of the voice data, thereby determining the reproduction frequency of the time-series voice data, and the voice data of the shifted spectrum is interpolated or thinned out so that the number of voice data points in the frequency-domain spectrum over the same bandwidth is the same before and after the shift. It is characterized in that the pitch of the voice data is changed when the frequency-domain voice data obtained by the interpolation/thinning is inverse-transformed into time-series voice data.

[0018] The fifth means is a time/pitch conversion method in which voice data compressed as frequency data is received and, when the compressed voice data is inverse-transformed from the frequency domain to the time domain to obtain time-series voice data, the spectrum of the voice data in the frequency domain is shifted according to the playback time of the reproduced voice, thereby determining the reproduction frequency of the time-series voice data; the voice data of the shifted spectrum is interpolated or thinned out so that the number of voice data points in the frequency-domain spectrum over the same bandwidth is the same before and after the shift; and a clock signal whose frequency is variable according to the playback time of the reproduced voice is generated and supplied at least to a DAC. It is characterized in that the playback time of the voice data is extended or shortened when said DAC converts the digitized time-series voice data, obtained by the inverse transformation from the frequency domain to the time domain, into analog voice data based on the supplied clock signal.
[0019] 

[Embodiment of the Invention] An embodiment of this invention is described below with reference to the drawings.
[0020] Drawing 1 shows the configuration of an MP3 encoder/decoder that includes the function of the time/pitch converter according to one embodiment of this invention.

[0021] This embodiment describes pitch conversion when reproducing voice compressed by the MP3 method, one of the MPEG audio compression methods. Because the technique applies to any voice data held as frequency data, it can also be carried out with MPEG audio compression methods other than MP3, such as AAC, and the audio compression is not limited to MPEG methods. Since MPEG-compressed voice data is already recorded as frequency data, there is no need for the frequency/time conversion required when reproducing media that record time-series data. This embodiment exploits that fact: while leaving most of the filter processing performed when decoding MPEG-compressed voice data unchanged, it manipulates the frequency-domain spectrum information merely by adding a program of a few steps to the software that executes the filter-processing algorithm, and thereby realizes pitch conversion of reproduced voice easily.

[0022] In drawing 1, the MP3 encoder/decoder of this embodiment comprises an encoder 1, which receives time-series voice data and compresses it into frequency-domain data by the conventional MP3 compression method, and a decoder 2, which receives the frequency-domain output of encoder 1, inverse-transforms it into time-series data, and outputs it as time-series voice data. Encoder 1 has a hybrid filter bank 11; a psychoacoustic analyzer 12; an iteration loop 13; a Huffman coding section 14 that performs Huffman coding on the output of iteration loop 13; a side information coding section 15 that encodes side information from the output of iteration loop 13; and a bit stream formatting section 16 that forms a bit stream from the outputs of Huffman coding section 14 and side information coding section 15. Hybrid filter bank 11 has a subband analysis filter bank 111, an adaptive block-length MDCT 112, and an aliasing reduction butterfly section 113. Psychoacoustic analyzer 12 has a 256-point FFT (fast Fourier transform) 121, a 1024-point FFT 122, a predictability measurement section 123, a perceptual entropy evaluation section 124, and a signal-to-mask ratio calculation section 125. Iteration loop 13 has a nonlinear quantization section 131, a scale-factor calculation section 132, and a buffer control section 133.

[0023] Decoder 2 has a bit stream analysis section 21 that analyzes the frequency-domain bit stream output by bit stream formatting section 16 of encoder 1; a scale-factor decoding section 22 that decodes scale factors from the output of bit stream analysis section 21; a Huffman table decoding section 23 that decodes the Huffman table from the output of bit stream analysis section 21; a Huffman decoding section 24 that performs Huffman decoding on the outputs of bit stream analysis section 21 and Huffman table decoding section 23; an inverse quantization section 25 that performs inverse quantization on the outputs of scale-factor decoding section 22 and Huffman decoding section 24 to obtain spectrum information; and a hybrid filter bank 26 that reproduces time-series voice data from the output of inverse quantization section 25 and that, in this reconstruction process, includes the shift means and interpolation/thinning means that perform the pitch conversion processing characteristic of this embodiment. Hybrid filter bank 26 comprises an aliasing reduction butterfly section 261 that applies butterfly operations to the spectrum information obtained by inverse quantization section 25, an inverse MDCT 262 that performs the inverse transform on the output of butterfly section 261, and a subband synthesis filter bank 263 that performs subband synthesis on the output of inverse MDCT 262.

[0024] Hybrid filter bank 26 of decoder 2 performs butterfly operations, inverse MDCT, and QMF (subband) synthesis, and these are implemented in software as a single combined algorithm. To perform pitch conversion within this algorithm, the shift means first shifts the frequency-domain spectrum information when the frequency-to-time conversion is performed, thereby determining the frequency of the reproduced voice; the interpolation/thinning means then interpolates or thins out the frequency-domain data of the shifted spectrum information to adjust the number of data points. As a result, when the spectrum information is returned to the time domain, the pitch is changed while the playback time remains unchanged.
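The shift and interpolation/thinning steps can be sketched on a one-sided spectrum in Python. This is a hedged illustration, not the embodiment's implementation: the name `shift_spectrum`, the treatment of the spectrum as a plain list of bin values, and the use of linear interpolation for arbitrary ratios are assumptions. Output bin j takes the value the original spectrum had at position j / ratio, so the number of bins (and therefore the block length after the inverse transform) never changes. For ratio 2 this inserts midpoints between neighboring bins (interpolation); for ratio 1/2 it keeps every second bin (thinning) and leaves the vacated upper band at zero.

```python
def shift_spectrum(bins, ratio):
    """Scale all frequencies by `ratio` while keeping the number of bins fixed.

    Output bin j is read from input position j / ratio by linear interpolation;
    positions beyond the original band are set to zero.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    last = len(bins) - 1
    out = []
    for j in range(len(bins)):
        pos = j / ratio
        if pos > last:
            out.append(0.0)                   # outside the original band
            continue
        i = int(pos)
        frac = pos - i
        upper = bins[i + 1] if i < last else 0.0
        out.append(bins[i] * (1 - frac) + upper * frac)
    return out


spectrum = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # single peak at bin 2
print(shift_spectrum(spectrum, 2.0))   # [0.0, 0.0, 0.0, 2.0, 4.0, 2.0, 0.0, 0.0, 0.0]
print(shift_spectrum(spectrum, 0.5))   # [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Reading each output bin from a fractional input position handles both directions with one rule; any other interpolation kernel could be substituted at the marked line.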

[0025] The above processing is explained next with reference to drawings 3 to 9, using the frequency-domain sine-wave data shown in drawing 2 as an example. The explanation is based on the results of a simulation of the spectrum information in the 0-16 kHz band using FFT/inverse FFT. The data input to the inverse FFT represent a 1 kHz sine wave, with a sampling frequency of 32 kHz and a transform (measurement) size of 64.

[0026] Without pitch conversion, the output sound signal is as shown in drawing 3. Consider raising the pitch of this sound signal by a factor of two. First, as shown in drawing 4, the spectrum information shown in drawing 2 is shifted so that each frequency is doubled. The band of the spectrum information then spreads to 32 kHz, so the widened band is cut back to its lower half, 0-16 kHz. As a result, the number of data points in the 0-16 kHz band falls from 64 to 32. If the data were converted from the frequency domain to the time domain in this state, the playback time would be halved from the 4000 microseconds shown in drawing 3 to 2000 microseconds. To avoid this, data are interpolated into the spectrum information shown in drawing 4, increasing the number of data points from 32 back to 64, the same number as before the shift, as shown in drawing 5. The interpolation uses linear interpolation, adding a data point at the midpoint between each pair of adjacent data points. After the data have been interpolated in this way to restore a transform size of 64, they are inverse-transformed from the frequency domain to the time domain. Consequently, as shown in drawing 6, the reproduced data form a sine wave with a frequency of 2 kHz while the playback time remains 4000 microseconds. That is, the pitch of the sine-wave data can be doubled without changing the playback time.
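The doubling experiment can be reproduced approximately with a naive DFT in Python, using the stated 32 kHz sampling frequency, a 64-point transform, and a 1 kHz input. This is a sketch under assumptions: the helper names are hypothetical, the O(N^2) DFT stands in for the FFT/inverse FFT of the simulation, the midpoint interpolation is applied to the complex bin values of the one-sided spectrum, and the negative-frequency half is rebuilt by conjugate symmetry so that the output stays real. The output block keeps its 64 samples (unchanged playback time) while its dominant frequency moves from 1 kHz to 2 kHz.

```python
import cmath
import math

FS = 32000   # sampling frequency in Hz, as in [0025]
N = 64       # transform size, as in [0025]


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def double_pitch(x):
    n = len(x)
    half = dft(x)[: n // 2 + 1]              # bins 0 .. n/2 (0-16 kHz here)
    kept = half[: len(half) // 2 + 1]        # after doubling, only these stay below 16 kHz
    shifted = []
    for i in range(len(kept) - 1):           # interpolate: original bin, then midpoint
        shifted.append(kept[i])
        shifted.append((kept[i] + kept[i + 1]) / 2)
    shifted.append(kept[-1])                 # len(shifted) == len(half)
    full = [0j] * n                          # rebuild the conjugate-symmetric spectrum
    for k, v in enumerate(shifted):
        full[k] = v
        if 0 < k < n - k:
            full[n - k] = v.conjugate()
    return idft(full)


def peak_hz(x):
    mags = [abs(v) for v in dft(x)[: len(x) // 2 + 1]]
    return max(range(len(mags)), key=mags.__getitem__) * FS / len(x)


x = [math.cos(2 * math.pi * 1000 * t / FS) for t in range(N)]
y = double_pitch(x)
print(len(y), peak_hz(x), peak_hz(y))   # 64 1000.0 2000.0
```

In this sketch the midpoint interpolation also leaves half-amplitude components in the neighboring bins (1.5 kHz and 2.5 kHz here); no attempt is made to suppress them.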

[0027] Next, consider lowering the pitch of the sine-wave data shown in drawing 2 to 1/2. In this case, as shown in drawing 7, the spectrum information shown in drawing 2 is shifted so that each frequency is halved. The band of the spectrum information thereby narrows from 16 kHz to 8 kHz. If the data were converted from the frequency domain to the time domain in this state, the playback time would double from 4000 microseconds to 8000 microseconds. To avoid this, the spectrum information shown in drawing 7 is thinned out, reducing the number of data points from 64 to 32 (the 0-8 kHz band), which matches the number of data points that band held before the shift, as shown in drawing 8. The thinning is performed by deleting the data point at the midpoint between each pair of data points. After the data have been thinned out in this way to a transform size of 32, they are inverse-transformed from the frequency domain to the time domain. Consequently, as shown in drawing 9, the reproduced data form a sine wave with a frequency of 0.5 kHz while the playback time remains 4000 microseconds. That is, the pitch of the sine-wave data can be halved without changing the playback time.
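For the halving case, here is a Python sketch that works directly on the one-sided amplitude spectrum of the 1 kHz cosine (bin spacing 500 Hz for a 64-point block at 32 kHz). Assumptions: the spectrum is built analytically rather than by a transform, the helper names are hypothetical, and instead of shrinking the transform size to 32 as the simulation does, the vacated 8-16 kHz band is zero-filled so the block stays 64 samples long. Thinning keeps every second bin, which moves the peak from bin 2 (1 kHz) to bin 1 (500 Hz).

```python
import math

FS = 32000   # sampling frequency in Hz, as in [0025]
N = 64       # block size, as in [0025]


def halve_pitch(half):
    """Thin a one-sided spectrum: keep bins 0, 2, 4, ... and zero-fill the rest."""
    kept = half[::2]
    return kept + [0.0] * (len(half) - len(kept))


def synthesize(half, n):
    """Real inverse DFT of a one-sided spectrum with real (cosine) coefficients."""
    out = []
    for t in range(n):
        s = half[0] + half[n // 2] * math.cos(math.pi * t)
        s += 2 * sum(half[k] * math.cos(2 * math.pi * k * t / n) for k in range(1, n // 2))
        out.append(s / n)
    return out


half = [0.0] * (N // 2 + 1)
half[2] = N / 2                      # a 1 kHz cosine of amplitude 1 sits in bin 2
thinned = halve_pitch(half)
y = synthesize(thinned, N)
print(thinned.index(max(thinned)) * FS / N)   # 500.0
print(len(y))                                 # 64
```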
[0028] As explained above, the pitch conversion of the above embodiment applies frequency-domain processing, which is accurate and produces less noise than time-domain processing, to data recorded as frequency data, such as MP3 and AAC. Simply adding a few software steps, a frequency shift and data interpolation/thinning, to the frequency-to-time conversion process makes it easy to vary the pitch of reproduced voice arbitrarily. Moreover, because compressed storage media on which MP3, AAC, or similar compressed data is recorded output data in frequency units, the heavy processing of converting data from the time domain to the frequency domain, as needed with tape or CD, is unnecessary, and that load on the arithmetic unit is eliminated. Furthermore, because no processing is applied to time-domain data, no jarring noise arises in the reproduced voice. Next, time stretch/compression applying the preceding embodiment is explained.

[0029] Drawing 10 shows the configuration of a voice data reproduction apparatus that includes the function of the time/pitch converter according to another embodiment of this invention.

[0030] In drawing 10, the voice data reproduction apparatus comprises storage media 31 that output a compressed sound signal; a storage media I/F circuit 32 that receives the compressed sound signal output from storage media 31; a DSP (digital signal processor) 33 that receives the output of storage media I/F circuit 32 and provides the functions of encoder 1 and decoder 2 shown in drawing 1 and of the time/pitch converter; a DAC 34 that converts the digital signal output from DSP 33 into an analog signal; a variable clock speed circuit 35 that generates a clock signal in response to a clock speed setting signal; and a system clock generation circuit 36 that generates the system clock signal in response to the output of variable clock speed circuit 35.

[0031] In this configuration, the voice data is read out from the storage medium 31, so the read-out rate can be chosen
freely. As long as the MIPS value (throughput per unit time) required to decode the read-out data is met, the system
clock of the DSP 33 can be set freely. Moreover, the system is self-contained in the configuration shown in drawing 10.
If it is intended only for audio playback, there is no need to supply a clock at a fixed frequency, such as the sampling
frequency, to other circuits, so the system clock of the DAC 34 can also be chosen freely. In other words, as long as
the playback sound is not adversely affected, making the system clock of the system shown in drawing 10 variable causes
no problem, and the system clock can easily be made variable. Using this property, the following describes an operation
that changes only the playback time, without changing the pitch of the playback sound. The pitch of the voice data is
changed in advance by the method of the previous embodiment, and the system clock of the whole system, including the
DAC 34, is made variable. Time stretch is described first. The system clock generation circuit 36 is set in advance so
that the system clock becomes one half of its value during normal operation. A variable system-wide clock can easily be
realized with a device such as a frequency divider. Halving the system clock also halves the MIPS value of the DSP 33,
but this causes no particular problem as long as decoding of the input data is not impaired. The hybrid filter bank 26
is operated on the data shown in drawing 2 and drawing 3 by the technique explained in the previous embodiment, and the
pitch of the data is doubled when the data is inverse-transformed from the frequency domain to the time domain. Because
the system clock supplied to the DAC 34 is one half of its normal value, the pitch of the playback voice obtained by the
inverse transformation consequently returns to the original pitch, as shown in drawing 11, and the playback time is
doubled.
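The bookkeeping behind the time stretch can be checked with a short calculation. The symbols here are illustrative
assumptions and do not appear in the original text. Let a decoded component have frequency $f$ at the normal sampling
rate $f_s$, and let a block contain $N$ samples, so that its normal playback time is $T = N/f_s$. The spectral shift
doubles the component's frequency in cycles per sample, and halving the DAC clock then halves both the output frequency
and the sample rate:

```latex
f_{\text{out}} = \frac{2f}{f_s}\cdot\frac{f_s}{2} = f,
\qquad
T_{\text{out}} = \frac{N}{f_s/2} = 2\,\frac{N}{f_s} = 2T .
```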

[0032] Time compression is the reverse of the above case. The system clock generation circuit 36 is set in advance so
that the system clock becomes twice its value during normal operation. The hybrid filter bank 26 is operated on the data
shown in drawing 2 and drawing 3 by the technique explained in the previous embodiment, and the pitch of the data is
lowered to one half when the data is inverse-transformed from the frequency domain to the time domain. Because the system
clock supplied to the DAC 34 is twice its normal value, the pitch of the playback voice obtained by the inverse
transformation consequently returns to the original pitch, as shown in drawing 12, and the playback time is halved.
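Both cases can be simulated in a few lines. This is a minimal sketch, not the implementation described here: it
replaces the per-frame MDCT processing inside the hybrid filter bank 26 with a single FFT over the whole signal, and the
helper names `shift_spectrum`, `time_scale` and `dominant_hz` are hypothetical. The spectrum is shifted by a factor `r`
by resampling the bins (linear interpolation when shifting up, thinning out when shifting down), which keeps the bin
count the same. The DAC clock is then scaled by `1/r`.

```python
import numpy as np


def shift_spectrum(spec, r):
    """Move content at bin k to bin k*r, keeping the bin count (hypothetical helper).

    r > 1 spreads the bins apart and linearly interpolates the gaps;
    r < 1 packs them together, i.e. thins bins out. Content pushed past
    the top bin is discarded.
    """
    bins = np.arange(len(spec))
    src = bins / r  # fractional source bin that feeds each output bin
    re = np.interp(src, bins, spec.real, right=0.0)
    im = np.interp(src, bins, spec.imag, right=0.0)
    return re + 1j * im


def time_scale(x, fs, factor):
    """Scale playback time by `factor` while keeping the pitch (sketch).

    The pitch is first multiplied by `factor` in the frequency domain;
    dividing the DAC clock by `factor` then restores the pitch and
    multiplies the playback time by `factor`.
    """
    y = np.fft.irfft(shift_spectrum(np.fft.rfft(x), factor), n=len(x))
    return y, fs / factor  # samples, and the DAC clock to play them at


def dominant_hz(x, clock_hz):
    """Frequency of the strongest bin when x is played at clock_hz."""
    return np.argmax(np.abs(np.fft.rfft(x))) * clock_hz / len(x)


fs = 8000
t = np.arange(fs) / fs            # 1 s of audio at the normal clock
x = np.sin(2 * np.pi * 440 * t)   # 440 Hz test tone

for factor in (2.0, 0.5):         # time stretch, then time compression
    y, dac_hz = time_scale(x, fs, factor)
    print(factor, dominant_hz(y, dac_hz), len(y) / dac_hz)
```

With the 440 Hz test tone, both cases measure a pitch of 440 Hz, with a playback time of 2 s for the stretch and
0.5 s for the compression. Shifting up with linear interpolation spreads a pure tone over neighboring bins. That does
not matter here because only the strongest bin is measured.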
[0033] Thus, in a voice playback system that includes the DAC 34, time stretch and compression can be realized easily
merely by adding a simple variable system clock circuit to the configuration of the previous embodiment. There is no
need, as in the conventional approach, to add a device that regulates the read-out speed, a large buffer memory, or
memory management equipment. That is, consider a voice playback system consisting of arithmetic circuits and a DAC driven
by the same system clock, in which the system clock can be made variable at an arbitrary rate because the system is
intended only for audio playback. In such a system, a time stretch / compression function that extends or shortens only
the playback time while keeping the pitch of the data fixed can easily be realized simply by changing the operating clock
in the configuration of the embodiment described above.
[0034] 

[Effect of the Invention] As explained above, according to this invention, the spectrum of the voice data compressed as
frequency data is shifted, the data is then interpolated or thinned out, and the result is inverse-transformed into voice
data of time-series data. The pitch of the playback voice can therefore be changed easily without changing the playback
time. Furthermore, in addition to this inverse transformation processing, the frequency of the operating clock signal
used when converting the digital voice signal into an analog sound signal is changed according to the playback time, so
the playback time of the playback voice can easily be extended or shortened without changing the pitch.






* NOTICES * 

JPO and NCIPI are not responsible for any
damages caused by the use of this translation.

1. This document has been translated by computer. So the translation may not reflect the original precisely.
2. **** shows a word that could not be translated.
3. In the drawings, words are not translated.



DESCRIPTION OF DRAWINGS 



[Brief Description of the Drawings] 

[Drawing 1] A drawing showing the configuration of an MP3 encoder/decoder that includes the function of the time/pitch
converter according to one embodiment of this invention.

[Drawing 2] A drawing showing an example of sinusoidal data in the frequency domain.

[Drawing 3] A drawing showing the output sound signal corresponding to drawing 2.

[Drawing 4] A drawing showing the sinusoidal data obtained by doubling the frequency of drawing 2.

[Drawing 5] A drawing showing the sinusoidal data obtained by interpolating the data of drawing 4.

[Drawing 6] A drawing showing the output sound signal obtained by raising the pitch of the sound signal of drawing 3.

[Drawing 7] A drawing showing the sinusoidal data obtained by shifting the frequency of drawing 2 to one half.

[Drawing 8] A drawing showing the sinusoidal data obtained by thinning out the data of drawing 7.

[Drawing 9] A drawing showing the output sound signal obtained by lowering the pitch of the sound signal of drawing 3.

[Drawing 10] A drawing showing the structure of a voice playback system that includes the function of the time/pitch
converter according to another embodiment of this invention.

[Drawing 11] A drawing showing the output sound signal obtained by time-stretching the sound signal of drawing 3.

[Drawing 12] A drawing showing the output sound signal obtained by time-compressing the sound signal of drawing 3.

[Drawing 13] A drawing showing a conventional example of pitch conversion of voice data.

[Drawing 14] A drawing showing an example of cross-fade processing.

[Drawing 15] A drawing showing another conventional example of pitch conversion of voice data.

[Drawing 16] A drawing showing a conventional technique for time stretching of voice data.
[Description of Notations] 

1 Encoder 

2 Decoder 

11, 26 Hybrid Filter Bank

12 Psychoacoustic Analyzer

13 Iteration Loop Section

14 Huffman Coding Section

15 Side Information Coding Section

16 Bit Stream Formation Section

21 Bit Stream Analysis Section

22 Scale-Factor Decoding Section

23 Huffman Table Decoding Section

24 Huffman Decoding Section

25 Inverse Quantization Section

111 Subband Analysis Filter Bank

112 Adaptive Block-Length MDCT

113, 261 Aliasing Reduction Butterfly Section

121, 122 FFT

123 Unpredictability Measure Section

124 Perceptual Entropy Evaluation Section

125 Signal-to-Mask Ratio Calculation Section

131 Nonlinear Quantization Section

132 Scale-Factor Calculation Section

133 Buffer Control Section

262 Inverse MDCT

263 Subband Synthesis Filter Bank


